resolve and document most common erasure coded pool pain points #3194

Merged
merged 5 commits into from Jan 18, 2015

Conversation

ghost commented Dec 17, 2014

No description provided.

@ghost added bug-fix core labels Dec 17, 2014
@loic-bot

SUCCESS: make check on fc53dc9 output is http://paste.ubuntu.com/9551698/

:octocat: Sent from GH.

@ghost
Author

ghost commented Dec 17, 2014

Documentation part review by Italo Santos okdokk@gmail.com

@loic-bot

SUCCESS: make check on b47b333 output is http://paste.ubuntu.com/9553221/

:octocat: Sent from GH.

@loic-bot

SUCCESS: make check on 4c05213 output is http://paste.ubuntu.com/9554907/

:octocat: Sent from GH.

@ghost changed the title from "resolve and document most common erasure coded pool pain points" to "DNM: resolve and document most common erasure coded pool pain points" Dec 19, 2014
@ghost
Author

ghost commented Jan 6, 2015

rebased and repushed

@loic-bot

loic-bot commented Jan 6, 2015

FAIL: the output of run-make-check.sh on 4df7a46 is http://paste.pound-python.org/show/EwJFIOR3jzfcKYk3ArDW/

:octocat: Sent from GH.

@loic-bot

loic-bot commented Jan 6, 2015

SUCCESS: the output of run-make-check.sh on c3edf67 is http://paste.pound-python.org/show/RcxenPns9Pdz28O1g1iB/

:octocat: Sent from GH.

@ghost
Author

ghost commented Jan 6, 2015

running in gitbuilder

@ghost changed the title from "DNM: resolve and document most common erasure coded pool pain points" to "resolve and document most common erasure coded pool pain points" Jan 8, 2015
@ghost assigned liewegas Jan 8, 2015

http://tracker.ceph.com/issues/10349 Fixes: #10349

Signed-off-by: Loic Dachary <ldachary@redhat.com>

It is common for people to try to map 9 OSDs in a Ceph cluster that has
only 9 OSDs in total. The default number of tries (50) frequently leads
to bad mappings in this case. Raising it to 100 makes no significant
difference in CPU usage, as tested manually by running crushtool on one
million mappings.
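
Why a larger retry budget helps when every OSD must be used can be illustrated with a toy model. This is a minimal sketch and not the CRUSH algorithm: it assumes each still-empty slot draws one OSD uniformly at random per round, and the function name `count_bad_mappings` and its parameters are made up for illustration.

```python
import random


def count_bad_mappings(num_osds, size, tries, trials=1024, seed=0):
    """Count trials in which fewer than `size` distinct OSDs were picked.

    Toy model, not CRUSH: in each round, every still-empty slot draws one
    OSD uniformly at random and keeps it only if no other slot holds it.
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        slots = [None] * size
        taken = set()
        for _ in range(tries):
            for i in range(size):
                if slots[i] is None:
                    osd = rng.randrange(num_osds)
                    if osd not in taken:
                        slots[i] = osd
                        taken.add(osd)
            if len(taken) == size:
                break  # every slot is filled, stop retrying
        if len(taken) < size:
            bad += 1  # retries ran out with at least one empty slot
    return bad


if __name__ == "__main__":
    # Compare the two retry budgets mentioned above on a 9-of-9 mapping.
    for tries in (50, 100):
        print(tries, count_bad_mappings(num_osds=9, size=9, tries=tries))
```

In this model the last empty slot has only a 1 in 9 chance per round of hitting the one remaining OSD, so a larger number of rounds sharply reduces the share of incomplete mappings. The real numbers depend on the actual crushmap and should be measured with crushtool.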

http://tracker.ceph.com/issues/10353 Fixes: #10353

Signed-off-by: Loic Dachary <ldachary@redhat.com>

The ruleset created for an erasure coded pool has max_size set to a
fixed value of 20, which is incorrect when more than 20 chunks are
needed and leads to obscure errors. Set it to the number of chunks
instead, which is k+m most of the time.

In a cluster with few OSDs (9 for instance), setting max_size to 20
causes performance problems when injecting a new crushmap. The monitor
calls CrushTester::test, which tries 1024 mappings for each size from
min_size to max_size. Each attempt to map more OSDs than are available
exhausts all retries (50 by default), which takes a significant amount
of time. In a cluster with 9 OSDs, testing one such ruleset can take up
to 5 seconds.

Since the test blocks the monitor leader, a few erasure coded rulesets
will block the monitor long enough to exceed the timeouts and trigger an
election.
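
The cost described above can be estimated with simple arithmetic. The sketch below is a rough lower bound and not the CrushTester implementation: the 1024 mappings, 50 retries, and max_size of 20 come from the commit message, while `min_size=3` and the profile `k=6, m=3` are assumptions chosen for illustration, as is the function name `exhausted_attempts`.

```python
def exhausted_attempts(num_osds, min_size, max_size, tries=50, mappings=1024):
    """Lower bound on retries spent on sizes that cannot be mapped.

    A size larger than num_osds can never be satisfied, so each of its
    `mappings` test mappings uses up all `tries` retries.
    """
    impossible_sizes = [s for s in range(min_size, max_size + 1) if s > num_osds]
    return len(impossible_sizes) * mappings * tries


if __name__ == "__main__":
    k, m = 6, 3  # hypothetical erasure code profile
    # Fixed max_size of 20 on a 9 OSD cluster: sizes 10..20 all fail.
    print(exhausted_attempts(num_osds=9, min_size=3, max_size=20))     # 563200
    # max_size set to the chunk count k+m: no impossible size is tested.
    print(exhausted_attempts(num_osds=9, min_size=3, max_size=k + m))  # 0
```

With max_size equal to the number of chunks, the test never tries a size the pool will not use, which is where the saving comes from.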

http://tracker.ceph.com/issues/10363 Fixes: #10363

Signed-off-by: Loic Dachary <ldachary@redhat.com>

Add a new subsection to the PG troubleshooting section that covers the
most common problems reported when an erasure coded pool fails to map
PGs to enough OSDs.
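
As a small illustration of what such a failed mapping looks like, the sketch below checks an up set for positions that have no OSD. It assumes the placeholder value 2147483647 (CRUSH_ITEM_NONE, 0x7fffffff) marks an unmapped position; the helper name `missing_chunks` and the sample up sets are hypothetical.

```python
# Placeholder assumed to mark "no OSD mapped" in an up set (0x7fffffff).
CRUSH_ITEM_NONE = 2147483647


def missing_chunks(up_set):
    """Return the positions in an up set that have no OSD assigned."""
    return [i for i, osd in enumerate(up_set) if osd == CRUSH_ITEM_NONE]


if __name__ == "__main__":
    # Hypothetical up set for a 4-chunk pool where the last chunk is unmapped.
    print(missing_chunks([1, 2, 0, CRUSH_ITEM_NONE]))
```

A non-empty result means CRUSH gave up before finding enough OSDs for that PG, which is the situation the new documentation section addresses.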

http://tracker.ceph.com/issues/10350 Fixes: #10350

Signed-off-by: Loic Dachary <ldachary@redhat.com>

Use different erasure coded pool names and profiles in each test to
avoid deletion/creation races. The more expensive alternative is to run
a different cluster for each test.

Signed-off-by: Loic Dachary <ldachary@redhat.com>

@ghost added the needs-qa label Jan 15, 2015
@loic-bot

SUCCESS: the output of run-make-check.sh on centos-centos7 for ac051fe is http://paste2.org/dFMbjVBs

:octocat: Sent from GH.

liewegas added a commit that referenced this pull request Jan 18, 2015
…ries

resolve and document most common erasure coded pool pain points

Documentation-Reviewed-by: Italo Santos <okdokk@gmail.com>
@liewegas liewegas merged commit 31eb4c6 into ceph:master Jan 18, 2015